Reduce the delete delay with an exponential wait#4512
Reduce the delete delay with an exponential wait#4512abel-von wants to merge 1 commit intoopencontainers:mainfrom
Conversation
currently in the `killContainer`, after send SigKill, we have to wait at least 100ms and then check if the process exit. it is a little bit long, especially for those pod with many containers in, when `crictl rmp -f` the pod, the delay may even exceed 1s. Signed-off-by: Abel Feng <fshb1988@gmail.com>
|
If the process is going to take 1s to die, then speeding up the rate of sending signals won't make a difference (processes can't ignore Checking whether the process is dead at a constant rate seems like it'll be more consistent on average if the death time is randomly distributed from 0 to a few seconds. I could see an argument that we should check a bit more often at the very start (I expect that process deaths happen more quickly soon after the signal is sent, with a long-ish right tail), but I don't think switching entirely to exponential backoff is the correct approach. |
|
Ideally, for new kernels we should poll on pidfd (kernel marks pidfd readable when the process is terminated). |
|
Sorry, I think maybe I made an unclear description, It is actually because the container could actually exit very quickly (maybe in serveral milliseconds), but we runc has to wait at least 100ms to make sure it is exited. @cyphar |
Agree, but for old kernels, the basic time waiting seems too long. every container should wait at least 100ms to kill, no matter how quick it is exited actually. @kolyshkin |
@abel-von Could you please test whether #4517 could meet your requirement or not in a low load machine? If not, I think If we set the first try after 10ms, it's enough. |
|
@lifubang Yes it is better to use pidfd. I can close this PR I think. |
Currently in the
killContainer, after send SigKill, we have to wait at least 100ms and then check if the process exit. it is a little bit long, especially for those pod with many containers in, whencrictl rmp -fthe pod, the delay may even exceed 1s.Here we changed it to an exponential wait, that the max wait time will be 2 secs(2^11), so the total wait time is approximatly 4s, which I think is in the same level of current 10s(100ms * 100).